DATA COMPRESSING APPARATUS AND 
DATA DECODING APPARATUS 



BACKGROUND OF THE INVENTION 

Field of the Invention 

The invention relates to a data compressing apparatus and a data
decoding apparatus in which a data compressing/decoding method such as
LZSS or the like is extended.

Related Background Art 

Techniques for transmitting data via a network in a short time have
progressed in association with the recent spread of the Internet. A
data compression method aims to accomplish this object by reducing the
amount of data to be transmitted.

Among lossless data compression methods, the origin of the dictionary
coding which is most frequently used at present is the Lempel-Ziv
encoding method created by Abraham Lempel and Jacob Ziv in 1977, which
is called sliding dictionary coding, LZ77, or the like. According to
the LZ77, the input character data string which has already been
encoded is used as a dictionary, the longest coincidence with a
continuous portion of the input character data string is searched for,
and the coincidence information is encoded. Since the LZ77 has the
null pointer, and since the first character which is outputted after
the matching can also be used for the next matching, there is no need
to output it in such a case, so that there is a slight redundancy.
After that, as a method of improving it, Storer and Szymanski created a
data compression method called LZSS. Various techniques using such
LZSS have been proposed (for example, refer to JP-A-5-241777,
JP-A-7-273667, and JP-A-2000-315954).

The LZSS is a method in which the pointer is outputted only when the
character string is longer than a predetermined pointer length and, in
the other cases, character data is outputted. Therefore, since
encoding in the LZSS outputs data in which the encoded character data
and pointers are mixed, a specific bit (an extra ID-bit) is used to
distinguish the character data from the pointer in the output.

It is known that, when the compression ratio of the LZSS is compared
with that of other compression methods, the compression ratio is high
particularly for character data having regularity. Another feature is
that its decoding is performed at a high speed. A large effect is
obtained particularly in the case where the pointer is encoded by a
method combined with other entropy methods.

However, since the output of the conventional data compression method
is constructed only by the character data and the pointer, it is
difficult to embed data other than the character data, for example,
information such as a password or copyright information. For the same
reason, there is a limit to how far the compression ratio can be
improved within the range of those means.

SUMMARY OF THE INVENTION 

It is an object of the invention to solve the foregoing problems by
outputting raw data, a pointer, and a command from a data string,
thereby realizing a data compressing apparatus which can perform free
compression. That is, when the data string as a target of the command
coincides with a data string as a target of the raw data and a data
string indicated by the pointer or a combination of them, the command
is outputted in place of the raw data or the pointer.

According to the present invention, there is provided a data 
compressing apparatus for outputting raw data, a pointer, and a 
command from a data string, comprising 

substituting means which outputs the command in place of 
the raw data or the pointer when a data string as a target of the 
command coincides with a data string as a target of the raw data and a 
data string indicated by the pointer or a combination of them. 

The data compressing apparatus may further comprise 
substituting means which, when a code length assigned to the command 
is shorter than a length of data comprising the raw data and the pointer, 
replaces the raw data or the pointer serving as a source with the 
command. 

In the data compressing apparatus, the command is 
constructed by a command code and an extension code. 

Further, in the data compressing apparatus, the extension 
code is constructed by a code indicative of a kind of command and an 
operand. 

Moreover, in the data compressing apparatus, a first bit is used to
distinguish the raw data, the pointer, and the command, and a second
bit is used to distinguish the pointer and the command. In this case,
as a first option, the data compressing apparatus may further comprise
substituting means which, when each of a plurality of commands is
constructed by a kind of command and an operand, executes encoding for
gradually increasing a code length in accordance with the kind of each
command. As a second option, the data compressing apparatus may
further comprise substituting means which, when each of a plurality of
commands is constructed by a kind of command and an operand, executes
encoding for gradually increasing the code length in accordance with a
parameter of each operand.

The data compressing apparatus may further comprise 
substituting means which executes, every predetermined data unit, a 
process for, when a data string designated by a specific pointer 
coincides with data strings designated by a plurality of other pointers, 
replacing the specific pointer with a definition command and replacing 
the plurality of other pointers with the code substitution command 
corresponding to the definition command. 

In the data compressing apparatus, among the data strings 
existing in the predetermined data unit, the data string which appears 
first is selected as a data string designated by the specific pointer. 

In this case, the data compressing apparatus may further comprise
substituting means which, when a plurality of definition commands are
set, counts frequencies of appearance, in the predetermined data unit,
of a plurality of data strings designated by the pointers replaced with
the plurality of definition commands and forms a definition table in
which the definition commands are listed in order of the frequencies of
appearance.

In the data compressing apparatus, the definition table is 
newly formed every predetermined data unit. 

The data compressing apparatus may further comprise 
substituting means which discriminates whether a data amount of the
data string designated by the specific pointer coincides with a data
amount of data string designated by the other pointer or not, 
discriminates whether a difference between an address of the data 
string designated by the specific pointer in the predetermined data unit 
and an address of the data string designated by the other pointer 
coincides with a difference between an offset value which the specific 
pointer has and an offset value which the other pointer has or not, and 
if they coincide, determines that the data string designated by the 
specific pointer coincides with the data string designated by the other 
pointer. 
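
The coincidence test described above can be illustrated by a minimal
Python sketch. The names Pointer and same_target, the field layout, and
the reading of "address" as the position in the data unit at which the
pointer's string is outputted are assumptions made only for
illustration; they do not appear in the embodiments.

```python
from typing import NamedTuple


class Pointer(NamedTuple):
    address: int  # assumed: position in the data unit where the string is outputted
    offset: int   # B: how far back the copied string starts
    length: int   # L: number of characters copied


def same_target(p: Pointer, q: Pointer) -> bool:
    """Return True when both pointers designate the same source string."""
    # The data amounts of the two designated strings must coincide.
    if p.length != q.length:
        return False
    # address - offset is where the copied string starts, so the two starts
    # coincide exactly when the address difference equals the offset difference.
    return p.address - q.address == p.offset - q.offset
```

For instance, under these assumptions, pointers placed at addresses 4
and 9 with offsets 4 and 9 and the same length both start at position
0, so they are judged to designate the same data string without
comparing the characters themselves.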

According to the present invention, there is provided a data 
decoding apparatus, wherein when data including raw data, a pointer, 
and a command is inputted, 

the command is executed, the data is returned to the raw 
data or the pointer, and the raw data or the pointer is returned to a 
target data string. 

Further, according to the present invention, there is 
provided a data decoding apparatus, wherein 

when data which has been compressed every predetermined 
data unit is given in a manner such that when a data string which is 
data including raw data, a pointer, and a command and is designated by 
a specific pointer coincides with data strings designated by a plurality 
of other pointers, the specific pointer is replaced with a definition 
command and the plurality of other pointers are replaced with a code 
substitution command corresponding to the definition command, the 
code substitution command is returned to the definition command, the 
definition command is returned to the raw data or the pointer, and the
raw data or the pointer is returned to the data string as a target for
every predetermined data unit.

The above and other objects and features of the present 
invention will become apparent from the following detailed description 
and the appended claims with reference to the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 is a constructional diagram of a data compressing 
apparatus in the embodiment 1 of the invention; 

Fig. 2 is an explanatory diagram of encoding of the LZSS;

Fig. 3 is an explanatory diagram of the encoding which is 
used in the LZSS or the like; 

Fig. 4 is an explanatory diagram showing a fundamental 
construction of a command in the embodiment 1; 

Fig. 5 is an explanatory diagram showing an example of 
creation of the command in the embodiment 1; 

Fig. 6 is a flowchart showing the operation of the data 
compressing apparatus in the embodiment 1; 

Fig. 7 is an explanatory diagram of substitution to the command;

Fig. 8 is an explanatory diagram of output data in the 
embodiment 1; 

Fig. 9 is a flowchart showing the operation of a data 
decoding apparatus in the embodiment 1; 

Fig. 10 is an explanatory diagram of encoding of the 
embodiment 2; 

Fig. 11 is an explanatory diagram showing a construction of
a command in the embodiment 2;

Fig. 12 is an explanatory diagram of an example of creation 
of the command in the embodiment 2; 

Fig. 13 is an explanatory diagram showing an example of 
creation of a command in the embodiment 3; 

Fig. 14 is an explanatory diagram showing a relation 
between a frequency of appearance of the same character string and the 
encoding in a certain input buffer; 

Fig. 15 is an explanatory diagram showing discriminating 
means for discriminating whether the pointer is the same pointer or 
not; 

Fig. 16 is an explanatory diagram showing an example of 
creation of a definition table of character strings and the like; 

Fig. 17 is an explanatory diagram showing an example of 
encoding in the embodiment 3; 

Fig. 18 is a flowchart showing the operation of the 
embodiment 3; and 

Fig. 19 is an explanatory diagram of setting of a code 
substitution command. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the invention will be described in detail hereinbelow
with reference to the drawings.

First, a principle of the invention will be explained prior to
describing the embodiments.

To solve the foregoing problems, the invention provides a data
compressing apparatus in which, in addition to character data and a
pointer, a command is newly added to an output data string and the
character data or the pointer and, further, the command itself are
controlled, thereby realizing the addition of a new function and the
improvement of a compression ratio as a whole.

As mentioned above, the data compressing apparatus of the invention has
a feature such that, in addition to the character data and the pointer,
the command is newly added to the compressed data. Although the
command can be variously defined, it is fundamentally used to output
the character data. For example, according to a pointer repetition
command, the relevant pointer is repeated a plurality of times, so that
the character data outputted by the pointer is repetitively outputted a
plurality of times. If the number of repetitions is small and the
number of character data outputted at a time is small, the data amount
increases due to the addition of the command, so that the compression
ratio may decrease. According to the invention, a mode in which the
command is not added (refer to the operation in step S4 in the
embodiment 1, which will be explained hereinafter) is selected in such
a case. Therefore, in principle, according to the invention, the
compression ratio does not fall below that of the conventional LZSS or
the like. There is an exceptional case where a decrease in compression
ratio is permitted in order to embed copyright information or the like.
It should be noted that the decoded data output does not change at all
whether or not the command is added. The embedded copyright
information or the like does not directly change the data output at
all. This point should always be kept in mind when extending a
conventional compression method.

The above principle of the data compression intends to establish a new
viewpoint, namely that the data compression changes the sequential
character string into a "program", stepping up from the direct
viewpoint of the conventional method, in which the data compression
changes the sequential character string into sequential character data
and pointers. The first step toward such a viewpoint is easily
understood by considering that the pointer is a kind of command. A
pointer (B, L) in a compression method such as LZSS or the like denotes
that a character string of L characters is outputted from the position
reached by tracing back B characters, as will be explained later. It
is really a kind of command. In the field of semiconductors, the
method of describing a logic circuit by a program has become
widespread, and languages such as Verilog are known. According to such
a language, a circuit structure is outputted as a program. In the
invention, a data structure is outputted as a program.

Fig. 1 shows a construction in which an additional encoder
(corresponding to substituting means 2 in the diagram) is introduced
when the data compressing apparatus of the invention is embodied.
Details of this construction will be explained hereinafter. According
to the data compressing apparatus shown in the diagram, input data 11
(character string) in an input buffer is converted into sequential
character data and pointers by a compressing apparatus 1 corresponding
to the conventional compressing apparatus and outputted to an
intermediate buffer. Thereafter, they are further converted into
output data 13 including the sequential character data, pointers, and
commands by the additional encoder (substituting means 2) introduced in
the invention and outputted to an output buffer. The conventional
compressing apparatus 1 can also be regarded as part of the additional
encoder introduced in the invention. The output of the sequential
character data, pointer, and command is really nothing but the creation
of the program.

Encoding of the LZSS will now be described as an example 
of the data compression method of outputting the character data and 
pointer from a character string. 

Fig. 2 is an explanatory diagram of the encoding of the LZSS.

Fig. 2 shows an example in which a character string "AABBCBBAABC" is
encoded with a pointer length of 2 bytes. In such a case, the encoding
is executed in the LZSS as follows.

(1) Since the first character "A" of the character string is the first
character and is not a target of compression, the raw data "A" is
outputted as character data.

(2) Although the second character "A" of the character string
coincides with the first character "A" in (1) which has already appeared,
since the pointer length is equal to 2 bytes and coincidence of 2 bytes
is not obtained, the raw data "A" is also outputted as it is as
character data in this case.

(3) The third character "B" of the character string is a character 
which newly appeared and is handled in a manner similar to (1). 

(4) The fourth character "B" of the character string is a character 
which appeared just before and is handled in a manner similar to (2). 

(5) The fifth character "C" of the character string is a character 
which newly appeared and is handled in a manner similar to (1). 

(6) Although the sixth character "B" of the character string is a
character which appeared before, when it is combined with the
subsequent seventh character "B", the combined characters coincide
with "BB". In this case, since the characters coincide with respect to
2 or more bytes of the pointer length and a character string "BBA"
combined with the eighth character "A" of the character string has not
appeared yet, a pointer (3, 2) is outputted. The pointer (B, L) denotes
that a character string of L characters is outputted from the position
reached by tracing back B characters.

(7) Since a character string "AAB" starting with the eighth
character of the character string coincides with the three-character
string starting from the first character and its length is equal to or
longer than the pointer length of 2 bytes, a pointer (7, 3) is outputted.

(8) Although the 11th character "C" of the character string is a 
character which appeared before, since it is shorter than the pointer 
length, it is handled in a manner similar to (2). 
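
Steps (1) to (8) can be reproduced with a minimal Python sketch. It is
an illustration only: the function name lzss_tokens, the brute-force
search, the default window of 2048 characters, and the rule that the
first longest match found wins are assumptions, not details taken from
Fig. 2.

```python
def lzss_tokens(text, min_match=2, window=2048):
    """Return a list of raw characters and (B, L) pointers for text."""
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_back = 0, 0
        # Try every earlier start position inside the window.
        for start in range(max(0, i - window), i):
            length = 0
            while (i + length < len(text)
                   and text[start + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_back = length, i - start
        if best_len >= min_match:
            # The match reaches the pointer length: output pointer (B, L).
            tokens.append((best_back, best_len))
            i += best_len
        else:
            # Otherwise output the raw character as it is.
            tokens.append(text[i])
            i += 1
    return tokens


print(lzss_tokens("AABBCBBAABC"))
# ['A', 'A', 'B', 'B', 'C', (3, 2), (7, 3), 'C']
```

The printed list matches the sequence of raw data and pointers derived
step by step above.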

Fig. 3 is an explanatory diagram of the encoding which is 
used in the LZSS or the like. 

The encoding intends to assign a code of a short code length 
to a pointer of a short data length whose frequency of appearance is 
high. 

In Fig. 3, output data (Compressed Stream) 101 is 
constructed by a compressed data string (Compressed String) 102 and 
an end marker (End Marker) 103. The Compressed String 102 is a set 
of (the head bit is equal to 0) + raw data (Raw Byte) 104 or a set of (the 
head bit is equal to 1) + a compression code (Compressed Bytes) 105. 
"The head bit" denotes the specific bit (extra ID-bit) mentioned above. 

While the Raw Byte 104 is constructed by 8 bits (1 byte) like ASCII,
the Compressed Bytes 105 is constructed by an offset (Offset) 106 and a
code length (Length) 107. That is, the Offset and the Length
correspond to B and L of the pointer (B, L) mentioned above. As the
Offset 106, a 7-bit offset (8 bits in total) whose head bit is equal to
1 or an 11-bit offset (12 bits in total) whose head bit is equal to 0 is
prepared in consideration of encoding efficiency. That is, when the
value of the offset is small, the 7-bit offset is used and, if it cannot
be expressed by the 7-bit offset, the 11-bit offset is used. The 11-bit
offset is based on 2 kbytes as the capacity of the input buffer.

The Length 107 indicates the foregoing entropy methods of 
assigning the code of the short code length to the pointer of the short 
data length whose frequency of appearance is high. The End Marker 
103 is assigned to a 9-bit code 0x180 whose offset is equal to 0. 0x180 
indicates "180" as a hexadecimal notation. In the case of the binary 
notation, 0x180 indicates "110000000" as shown in the diagram. 
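
The bit layout of Fig. 3 described above can be sketched in Python as
follows. The function names raw_byte_code and offset_code are
assumptions for illustration, codes are shown as strings of "0" and "1"
for readability, and the Length 107 field is not modelled because its
code table is not given in the text.

```python
def raw_byte_code(value: int) -> str:
    """Head bit 0 followed by the 8-bit raw byte (9 bits in total)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("raw byte must fit in 8 bits")
    return "0" + format(value, "08b")


def offset_code(offset: int) -> str:
    """7-bit offset with head bit 1, or 11-bit offset with head bit 0."""
    if offset < 0 or offset >= (1 << 11):
        raise ValueError("offset does not fit in the 2-kbyte input buffer")
    if offset < (1 << 7):
        return "1" + format(offset, "07b")   # 8 bits in total
    return "0" + format(offset, "011b")      # 12 bits in total


# Head bit 1 (compression code) followed by a short offset of 0
# gives the 9-bit end marker 0x180.
END_MARKER = "1" + offset_code(0)
print(END_MARKER, hex(int(END_MARKER, 2)))
```

Under these assumptions the end marker is simply the compression code
whose 7-bit offset is 0, which is why it does not collide with any
ordinary pointer.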

Although the invention mainly relates to the compression of 
a character string, it will be obviously understood from the foregoing 
code construction that the input data is not limited to the character 
string so long as one unit consists of 8 bits. 

It is an object of the invention to improve a compression method such
as LZSS or the like so that a method for free compression having a
better compression ratio can be realized. It is important that such a
new method does not lose the advantages of the conventional compression
method. For this purpose, when the LZSS or the like is extended, and
particularly when a command is added, the invention extends the
conventional method with respect to the encoding of the command.
Specifically, the encoding method shown in Fig. 3 is extended.

Although it appears that an encoding which takes the entropy methods
into consideration has no room for extension, the End Marker (103 in
Fig. 3), which is its only special command, is used. In the
conventional encoding, the end marker is assigned to the 9-bit code
0x180 whose offset is equal to 0; in the embodiment 1 of the invention,
it is extended and used as a command.

« Embodiment 1 »
<Construction> 

Fig. 1 is a constructional diagram of the data compressing 
apparatus in an embodiment 1 of the invention. 

The apparatus shown in the diagram comprises the 
compressing apparatus 1 and the substituting means 2. The 
compressing apparatus 1 is the conventional compressing apparatus 
such as LZSS or the like and outputs intermediate data (character data 
and a pointer) 12 from the input data 11. The substituting means 2 is 
the additional encoder in the embodiment and has a function for adding 
a command to the intermediate data outputted from the compressing 
apparatus 1 and outputting output data (character data, pointer, 
command) 13. 

Fig. 4 is an explanatory diagram showing a fundamental 
construction of the command in the embodiment 1. 

In the embodiment, as shown in Fig. 4(a), each command is constructed
by an end marker 21 and an extension code 22. The end marker 21 is
similar to the conventional end marker (code corresponding to the End
Marker 103 in Fig. 3) and the 9-bit code 0x180 whose offset is equal to
0 is used as it is. This code merely follows the general coding method
shown in Fig. 3, and the invention is not limited to it. Whichever
code is selected, it is necessary to select a code which does not
infringe on the codes assigned to the character data and pointer upon
encoding.

According to the conventional data decoding method, when 
the compressed character data or the like is decoded and the end 
marker is detected, the decoder stops the decoding. However, a 
decoder in the embodiment interprets the end marker as an extension 
command and executes such a command. In the embodiment, as 
shown in Fig. 4(b), the extension code 22 is constructed by a kind 22a of 
command and an operand 22b. Naturally, the kind 22a of command 
has to include the conventional end marker (corresponding to the End 
Marker 103 in Fig. 3). If many commands are necessary, the invention 
copes with such a case by increasing the number of bits of the extension 
code 22. The operand 22b is a portion to designate a parameter 
corresponding to the kind 22a of command and shown as an argument 
for the command. 

Fig. 5 is an explanatory diagram showing an example of 
creation of the command. 

In the embodiment, 4 bits are assigned to the kind of 
command, thereby enabling 16 kinds of commands to be formed. 
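
The resulting command sizes can be tabulated with a short Python
sketch. The dictionary layout and the name command_bits are
assumptions for illustration; the kind codes and operand widths are the
ones stated for each command in this embodiment.

```python
END_MARKER_BITS = 9   # the 9-bit code 0x180
KIND_BITS = 4         # 4-bit kind of command, allowing 16 kinds

# kind code -> (mnemonic, operand field widths in bits)
COMMANDS = {
    0x0: ("END", ()),
    0x1: ("RD", (7, 8, 4)),       # B, L, N
    0x2: ("RP", (7, 4)),          # B, N
    0x3: ("RDP", (7, 8, 4)),      # B, L, N
    0x4: ("OMD", (7, 8, 7, 8)),   # B, L, M, C
    0x5: ("CP", (7, 7)),          # B1, B2
    0xD: ("SPW", (64,)),          # 8-byte password
    0xE: ("SCR", (64,)),          # 8-byte copyright information
    0xF: ("SCM", (64,)),          # 8-byte comment
}


def command_bits(kind: int) -> int:
    """Total code length: end marker + kind of command + operand."""
    _, operand_widths = COMMANDS[kind]
    return END_MARKER_BITS + KIND_BITS + sum(operand_widths)


for kind, (name, _) in COMMANDS.items():
    print(f"0x{kind:X} {name}: {command_bits(kind)} bits")
```

The totals are what the substituting means would compare against the
code length of the raw data or pointers that a command replaces.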

First, when the kind of command is equal to 0x0, it 
indicates an END command and denotes the end of output data in a 
manner similar to the conventional end marker. This command 
consists of 13 bits as a whole. 

When the kind of command is equal to 0x1, it indicates an RD (Repeat
Data) command and is a data control command. RD(B, L, N) means that a
string of L characters starting from the position B strings back is
repeated N times. Since 7 bits are assigned to a parameter B, 8 bits
are assigned to a parameter L, and 4 bits are assigned to a parameter
N, the command consists of 32 bits as a whole. It is assumed that
"before B strings" denotes that not only the character data but also
the pointers and commands are counted in the command. Naturally, a
command to count only the characters can be also formed. With this
command, a decoder error occurs when a pointer or a command other than
characters is included in the L data from the position B data back.
Assuming that the conventional pointer is a kind of command, the RD
(Repeat Data) command is an extension of it.

When the kind of command is equal to 0x2, it indicates an RP (Repeat
Pointers) command and is a pointer control command. RP(B, N) means
that the pointer located B pointers back is repeated N times. Since 7
bits are assigned to a parameter B and 4 bits are assigned to a
parameter N, the command consists of 24 bits as a whole. "Repeated N
times" denotes that the character data which is outputted by such a
pointer is repetitively outputted N times. Such a usage is similar to
a method in which the pointer itself is regarded as a word and this
word is repetitively outputted. From this viewpoint, since a group of
pointers which give different outputs corresponds to a dictionary in
which words are collected, it is sufficient that each word is listed
once in the dictionary. With this command, a decoder error occurs if
the data at the position B data back is other than a pointer.

When the kind of command is equal to 0x3, it indicates an RDP (Repeat
Data and Pointers) command and is a control command of the character
data and the pointer. RDP(B, L, N) means that L data strings starting
from the position B strings back are repeated N times. Since 7 bits
are assigned to the parameter B, 8 bits are assigned to the parameter
L, and 4 bits are assigned to the parameter N, the command consists of
32 bits as a whole. Similarly, it is assumed that "before B data
strings" denotes that not only the character data but also the
pointers and commands are counted in those strings. Also in this case,
a command in which "before B data strings" counts only the characters
can be formed. If a command to output the character data or the like
exists among the counted commands, the decoder in the embodiment also
has to decode its output. Although the encoding is easily performed,
the processing burden upon decoding increases.

When the kind of command is equal to 0x4, it indicates an 
OMD (Output Modified Data) command and is a data control command. 
OMD(B, L, M, C) has contents such that the Mth character data among 
the L data strings from a position before B strings is changed to C and 
outputted. Since 7 bits are assigned to a parameter B, 8 bits are 
assigned to a parameter L, further, 7 bits are assigned to a parameter 
M, and 8 bits are assigned to a parameter C, the command consists of 
43 bits as a whole. The meaning of the existence of this command is 
that when similar character strings are inputted, if only one character 
differs between them, the process becomes simpler by correcting the 
character string by this command. If the data amount increases due to 
the use of such a command, the command is not used in the 
embodiment. 

When the kind of command is equal to 0x5, it indicates a CP (Connect
Pointers) command and is a pointer control command. CP(B1, B2) means
that the character string shown by the pointer B1 pointers back and the
character string shown by the pointer B2 pointers back are coupled and
outputted. Since 7 bits are assigned to each of parameters B1 and B2,
the command consists of 27 bits as a whole. The purpose of this
command is to reduce the total number of bits below that needed to
output two pointers in succession. However, a problem arises in that a
pointer deleted by the substitution cannot be directly used by another
pointer control command. In this case, such a pointer can be used
indirectly if a command to decode the command itself is used in place
of the pointer, although the encoder is burdened. If the pointer is
the second or a subsequent pointer indicative of the same character
string, such a problem does not occur. If the data amount increases
due to the use of such a command, the command is not used in the
embodiment. A decoder error occurs if the indicated data is other than
a pointer.

When the kind of command is equal to 0xD, it indicates an SPW (Set
Password) command and is a password setting command. The SPW command
embeds a password in the operand subsequent thereto. In the
embodiment, since 8 bytes are assigned to the password, the command
consists of 77 bits as a whole. The password is verified first by an
application program (not shown). If the password is not authenticated,
the decoding by the decoder is inhibited. The apparatus must be
constructed so that, even if the data amount increases due to the
insertion of this command, the substituting means 2 does not have
authority to delete the command.

When the kind of command is equal to 0xE, it indicates an SCR (Set
Copyright) command and is a copyright information setting command. The
SCR command embeds copyright information into the operand subsequent
thereto. In the embodiment, since 8 bytes are assigned to the
copyright information, the command consists of 77 bits as a whole.
Since it is sufficient for the copyright information to identify the
copyright holder, the number of bytes should not be wastefully
increased, nor should the whole data amount be increased by including a
large amount of copyright information. An application program needs to
have a sequence for allowing the copyright holder to consent and
confirm.

When the kind of command is equal to 0xF, it indicates an SCM (Set
Comment) command and is a comment setting command. The SCM command
embeds a comment into the operand subsequent thereto. In the
embodiment, since 8 bytes are assigned to the comment, the command
consists of 77 bits as a whole. Attention must also be paid to the
increase in data amount. It is desirable that the application program
has means for displaying the comment. If a long comment of 256 bytes
is permitted for the SCM command, the comment itself can additionally
be compressed. In such a case, a comment termination (Comment
Termination) command CT showing the termination of the comment is used.
When the substituting means 2 in the embodiment detects the SCM
command, the subsequent character string data is handled as a comment.
When the CT command is detected, the comment is known to have
terminated.

The data decoding apparatus of the embodiment is an
apparatus for decoding the output data 13 compressed by the data 
compressing apparatus. That is, it is the data decoding apparatus 
having a construction characterized in that when data including the 
raw data, pointer, and command is inputted, the command is executed 
and the data is returned to the raw data or the pointer, and returned to 
the data string regarding the raw data or the pointer as a target. 
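
The decoding described above can be illustrated by a minimal Python
sketch which handles raw data, pointers, and the RP command only. The
token format (plain characters, ("PTR", B, L) and ("RP", B, N) tuples),
the function name decode, and the convention that B = 1 in RP refers to
the most recent pointer are assumptions for illustration; the bit-level
decoding of Fig. 3 is omitted.

```python
def decode(tokens):
    """Expand raw characters, ("PTR", B, L) pointers and ("RP", B, N) commands."""
    out = []              # decoded characters so far
    pointer_output = []   # string produced by each pointer, in order of appearance
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)                       # raw data is copied as it is
        elif tok[0] == "PTR":
            _, back, length = tok
            if not 1 <= back <= len(out):
                raise ValueError("pointer reaches before the start of the data")
            start = len(out)
            for _ in range(length):
                out.append(out[-back])            # copy from B characters back
            pointer_output.append("".join(out[start:]))
        elif tok[0] == "RP":
            _, back, times = tok
            if not 1 <= back <= len(pointer_output):
                raise ValueError("RP does not refer to a pointer")  # decoder error
            out.extend(pointer_output[-back] * times)
        else:
            raise ValueError(f"unknown token {tok!r}")
    return "".join(out)


print(decode(["A", "A", "B", "B", "C", ("PTR", 3, 2), ("PTR", 7, 3), "C"]))
# AABBCBBAABC
print(decode(["A", "B", "C", ("PTR", 3, 3), ("RP", 1, 2)]))
# ABCABCABCABC
```

The first call restores the character string of Fig. 2 from its raw
data and pointers; the second shows a command being executed and
returned to the character data of the pointer it repeats.
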
<Operation>

Fig. 6 is a flowchart showing the operation of the data 
compressing apparatus of the invention. 

First, the compressing apparatus 1 compresses the input data 11 and
outputs the character data and the pointer as intermediate data 12.
The character data and the pointer are inputted to the substituting
means 2 (step S1). The substituting means 2 then discriminates whether
a command can be substituted for the character data and the pointer or
whether the operator desires to add a special command (step S2). That
is, it discriminates whether any of the various commands shown in
Fig. 5 can be substituted or is to be added.

If "Y (Yes)" in step S2, whether the command is 
accompanied with the output of the character data or not is 
discriminated (step S3). If the command is not accompanied with the 
output of the character data ("N (No)" in step S3), the command is 
merely added and outputted (step S5). If the command is accompanied 
with the output of the character data ("Y" in step S3), whether the data 
amount is decreased due to it or not is discriminated (step S4). This 
discrimination is made by comparing the data amount in the case where 
the command has been substituted with that in the case where it is not 
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substituted and checking whether the data amount in the case where 
the command has been substituted is smaller than that in the case 
where it is not substituted. In step S4, if it is determined that the 
data amount decreases, step S5 follows and the substitution by the 
command is made. Thus, the output data 13 in which the character 
data, the pointer, and the command exist mixedly is outputted from the 
substituting means 2. 
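
The decision made in steps S3 to S5 can be illustrated with a minimal sketch. The function name and its parameters are hypothetical and do not appear in the embodiment; the sketch only assumes that the code length of the command and the code length of the character data and pointers to be replaced are both known in bits.

```python
def should_emit_command(command_bits, replaced_bits, outputs_character_data):
    """Return True when the command is to be outputted (step S5).

    command_bits: code length of the candidate command, in bits.
    replaced_bits: total code length of the character data and pointers
        that the command would replace, in bits.
    outputs_character_data: result of the discrimination in step S3.
    """
    if not outputs_character_data:
        # "N" in step S3: the command is merely added and outputted.
        return True
    # Step S4: substitute only when the data amount actually decreases.
    return command_bits < replaced_bits


# Example with the code lengths quoted for Fig. 8: a 24-bit RP command
# replacing two 13-bit pointers (26 bits in total).
print(should_emit_command(24, 26, True))
```

A command that is not accompanied with the output of character data, such as the embedding of copyright information, passes through this sketch without any comparison of data amounts.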

It is assumed that, in the case of embedding the copyright 
information, the substituting means 2 does not consider the 
reduction in data amount. 

Fig. 7 is an explanatory diagram of the substitution to a 
command. 

First, Fig. 7(1) shows the case of substituting pointers 
showing the same character string. When the substituting means 2 
detects that a first pointer P1 and a second pointer P2 indicate the 
same character string C, it tries to replace the latter pointer P2 with 
the command RP(B, L) in which the former pointer P1 is repeated. 
The substituting means 2 calculates the sum of code lengths of both 
pointers and performs the substitution if it is determined that the data 
amount is reduced. If there is a command to operate the pointer P2 
replaced by the substitution, this command is limited. Therefore, in 
the substituting process in the embodiment, in principle, the output 
data of the LZSS or the like has to be replaced in order from the head. 
Such a discrimination about the substitution is made by first checking 
whether the pointers are the pointers having the same character string 
or not and, thereafter, checking coincidence of the character string. 

Subsequently, Fig. 7(2) shows the case of substituting two 
adjacent pointers. The two adjacent pointers P1 and P2 are replaced 
with one command CP(B1, L1, B2, L2) to integrate the two pointers. 
The substituting means 2 checks the output data of the LZSS or the like 
in order from the head. When the two adjacent pointers are found, 
whether they can be replaced with the command CP defined in the 
embodiment or not is discriminated and only when it is decided that the 
data amount is reduced, the substitution is made. The substitution is 
made by deleting the codes of the two adjacent pointers and inserting a 
code corresponding to the command CP. However, it is desirable to 
apply such a process to the second and subsequent pointers in 
consideration of other commands or the like using those pointers. 

Further, Fig. 7(3) shows the case of substituting a command 
for a pointer indicative of a character string in which one character 
differs. When the substituting means 2 detects the pointer P1 
showing a character string C1 and the pointer P2 showing a character 
string C2 in which one character differs, the substituting means tries to 
replace the latter pointer P2 with the command OMD(B, L, M, C) to 
change one character in the character string C1 of the former pointer 
P1. After the substituting means 2 has detected the pointer P1 showing 
the character string C1, if a character string in which one character 
differs from the character string C1 exists and is constructed by 
a plurality of character data and pointers, the substituting means also 
tries to replace such a character string with the command OMD(B, L, M, 
C) to change such one character. The substituting means 2 calculates 
the sum of code lengths of both pointers and performs the substitution 
if it is determined that the data amount is reduced. Such substitution 
is effective when similar character strings often appear, and can 
contribute to reduction in data amount. 
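
The detection that precedes the OMD substitution can be sketched as follows. The function name is hypothetical, and the zero-based position returned for the differing character is an assumption; the text does not state how the parameter M is counted.

```python
def one_char_difference(c1, c2):
    """Return (position, new_character) when c2 differs from c1 in exactly
    one character, and None otherwise.

    The position is zero-based here, which is an assumption made only for
    this illustration.
    """
    if len(c1) != len(c2):
        return None
    differences = [(i, b) for i, (a, b) in enumerate(zip(c1, c2)) if a != b]
    if len(differences) == 1:
        return differences[0]
    return None


# C1 and C2 differ only in the sixth character.
print(one_char_difference("compress", "comprass"))
```

When a result is obtained, the position and the character would correspond to the parameters M and C of the command OMD(B, L, M, C), and the substitution would still be subject to the comparison of data amounts described above.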

Fig. 8 is an explanatory diagram of output data in the 
embodiment. 

It is assumed that character strings of Fig. 8(a) exist as raw 
data in the input buffer. The raw data is data in which "ABC" is 
repeated six times and, subsequently, a character string "D..." follows. 
The total number of bits of the repetitive portion is equal to 144 bits 
because one character is constructed by one byte (8 bits). 

Fig. 8(b) shows a result in which the data of (a) is encoded 
by the LZSS and corresponds to the output data of the compressing 
apparatus 1 in Fig. 1. The first character data string "ABC" consists 
of 27 bits ((8 bits as the number of bits of the character + 1 bit as a 
specific bit) x 3) and the number of bits is increased by 1 bit per 
character due to the encoding. The next output is a pointer "(3, 3)" 
and since an offset is equal to 3 and a length is equal to 3, this pointer 
consists of 11 bits (1 bit as a head bit showing Compressed Bytes 105 in 
Fig. 3 + 8 bits as the number of bits for the Offset 106 + 2 bits showing 
the Length 107). Similarly, each of the next two pointers "(6, 6)" and 
"(12, 6)" consists of 13 bits, so that the total number of bits of the output 
data is equal to 64 bits. Therefore, the data is compressed to about 44% 
of the original 144 bits. 

Fig. 8(c) shows the output data in the embodiment 1. 
Although the character data string "ABC" and the pointer "(3, 3)" are 
processed in a manner similar to the case of the LZSS, the next output 
data "RP(1, 4)" differs. As shown in the definition table of Fig. 5, this 
command is a pointer control command and is a command in which a 
one-preceding pointer is repeated four times. Since the "one-preceding 
pointer" is "(3, 3)" and its output data is the character data string 
"ABC", the output of the character data string "ABC" is repeated four 
more times, so that the output of the pointer appears five times in 
total. Naturally, the output data has code contents which coincide 
with the input character string shown in (a). In the embodiment 1, 
since the number of bits has been assigned as shown in Fig. 5, the 
pointer control command can be constructed by 24 bits. Thus, the 
output data consists of 62 bits in total and this means that the 
data is compressed to about 43% of the original data. 

Fig. 8(d) shows output data using another command 
according to the embodiment 1. Although the character data string 
"ABC" is processed in a manner similar to the case of the LZSS, the 
next output data "RD(3, 3, 5)" differs. As shown in the definition table 
of Fig. 5, this command is a data control command and is a command in 
which the three characters starting from the character three positions 
back are repeated five times. That is, the character data string "ABC" is 
outputted five times by this command. In the embodiment, since the 
number of bits has been assigned to each parameter as shown in Fig. 5, 
the data control command can be constructed by 32 bits. Eventually, 
the output data consists of 59 bits in total and this means that the 
data is compressed to about 41% of the original data. 

As mentioned above, in the embodiment, the compression 
ratio can be further improved by a few % as compared with that by the 
data compression method such as LZSS or the like with respect to the 
data as shown in Fig. 8. 
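
The bit counts of Fig. 8 can be checked with a short calculation. The code lengths below are the values quoted in the text; the dictionary labels and the function name are introduced only for this illustration.

```python
RAW_BITS = 6 * 3 * 8  # "ABC" repeated six times, 8 bits per character

# Code lengths in bits for each part of the output data in Fig. 8.
streams = {
    "(b) LZSS": [27, 11, 13, 13],  # "ABC", (3, 3), (6, 6), (12, 6)
    "(c) RP": [27, 11, 24],        # "ABC", (3, 3), RP(1, 4)
    "(d) RD": [27, 32],            # "ABC", RD(3, 3, 5)
}


def total_and_percent(code_bits):
    """Return the total code length and its size relative to the raw data."""
    total = sum(code_bits)
    return total, round(100 * total / RAW_BITS)


results = {name: total_and_percent(bits) for name, bits in streams.items()}
for name, (total, percent) in results.items():
    print(f"{name}: {total} bits, about {percent}% of {RAW_BITS} bits")
```

The totals are 64, 62, and 59 bits, that is, about 44%, 43%, and 41% of the 144 bits of the raw data.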



The operation of a data decoding apparatus (decoder) for 
decoding the data which has been compressed by the data compressing 
apparatus in the embodiment as mentioned above will now be 
described. 

Fig. 9 is a flowchart showing the operation of the data 
decoding apparatus. 

First, one code, which is character data, a pointer, or a 
command, is inputted (step S11). Whether the code is 
the character data or not is discriminated (step S12). If YES, relevant 
characters are outputted (step S13) and the processing routine 
advances to the next code. If the code is not the character data in step 
S12, whether the code is the pointer or not is discriminated (step S14). 
If YES, the character string indicated by the pointer is outputted (step S15) 
and the processing routine advances to the next code. If the code is 
not the pointer in step S14, whether the code is the command or not is 
discriminated (step S16). That is, whether the code is a code which 
starts with the end marker 21 or not is discriminated. In the case of 
the embodiment, it is possible to determine that the code which starts 
with the end marker is the command. 

In step S16, if the code is not the command ("N" in step S16), 
an error is displayed (step S17) and the data decoding process is 
terminated at this point of time. That is, since a code which is none 
of the character data, the pointer, and the command should not 
exist, the data decoding process is stopped. 

If the code is the command in step S16 and the command is 
a data output command ("Y" in step S18), the command is interpreted 
and the relevant character string is outputted (step S19). The 
processing routine advances to the next code. If the kind of command 
is not the data output command in step S18 ("N" in step S18), whether 
the command is the end marker (END command in Fig. 5) or not is 
discriminated (step S20). If the command indicates the end marker in 
step S20, since the data as a target of the decoding is the last data, the 
data decoding process is terminated. If the command is not the end 
marker, since it is the password setting command SPW, copyright 
information setting command SCR, or comment setting command SCM, 
the corresponding process such as display of the copyright information 
or the like is executed (step S21) and the processing routine advances to 
the next code. 
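
The flow of Fig. 9 can be sketched as a dispatch loop. This is an illustration under stated assumptions, not the decoder of the embodiment: the codes are represented as already-parsed tuples rather than bit strings, the names decode and copy_from_history are hypothetical, and the behavior of RP and RD is inferred from the description of Fig. 8 because the definition table of Fig. 5 is not reproduced here.

```python
def copy_from_history(out, offset, length):
    """Append `length` characters starting `offset` characters back."""
    start = len(out) - offset
    for i in range(length):
        out.append(out[start + i])


def decode(codes):
    """Decode tuples such as ("char", "A"), ("pointer", 3, 3),
    ("command", "RP", 1, 4)."""
    out = []
    pointers = []  # pointers seen so far, needed by RP
    for code in codes:
        kind = code[0]
        if kind == "char":                      # steps S12 and S13
            out.append(code[1])
        elif kind == "pointer":                 # steps S14 and S15
            _, offset, length = code
            copy_from_history(out, offset, length)
            pointers.append((offset, length))
        elif kind == "command":                 # step S16
            name, args = code[1], code[2:]
            if name == "RP":                    # data output command (S18, S19)
                back, times = args
                offset, length = pointers[-back]
                for _ in range(times):
                    copy_from_history(out, offset, length)
            elif name == "RD":                  # data output command (S18, S19)
                offset, length, times = args
                start = len(out) - offset
                out.extend(out[start:start + length] * times)
            elif name == "END":                 # step S20: end marker
                break
            else:                               # step S21: SPW, SCR, SCM
                pass                            # e.g. display copyright information
        else:                                   # step S17: error
            raise ValueError(f"unknown code: {code!r}")
    return "".join(out)


abc = [("char", "A"), ("char", "B"), ("char", "C")]
fig_8c = abc + [("pointer", 3, 3), ("command", "RP", 1, 4), ("command", "END")]
fig_8d = abc + [("command", "RD", 3, 3, 5), ("command", "END")]
print(decode(fig_8c) == decode(fig_8d) == "ABC" * 6)
```

Under these assumptions, the output data of Fig. 8(c) and Fig. 8(d) both decode to the raw data of Fig. 8(a).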
<Effects> 

As mentioned above, according to the embodiment 1, when 
the character string as a target of the command coincides with the 
character string of the character data, the character string indicated 
by the pointer, or a combination of them, the command is substituted 
for the character data or the pointer and is outputted. Therefore, 
for example, various information such as copyright information and 
a password can also be easily embedded and flexible compression can 
be performed. The invention can also contribute to the improvement 
of the compression ratio. 

In the embodiment 1, when the length of code assigned to 
the command is shorter than the length of data comprising the 
character data and the pointer, the character data or the pointer 
serving as a source is replaced with the command. Therefore, a 
phenomenon such that the compression ratio of the output data in the 
embodiment is reduced to a value lower than that of the output data 
obtained by the conventional compression method such as LZSS or the 
like can be prevented. 
«Embodiment 2» 

According to the embodiment 2, the command is identified 
by a specific bit string. 
<Construction> 

Since a construction of a data compressing apparatus in the 
embodiment 2 is similar to that of Fig. 1, its explanation is omitted here. 
A construction of encoding by the substituting means 2 in the 
embodiment 2 differs from that of the embodiment 1. 

Fig. 10 is an explanatory diagram of the encoding of the 
embodiment 2. 

First, output data (Compressed Stream) 201 is constructed 
by a compressed data string (Compressed String) 202 and an end 
marker (End Marker) 203. The Compressed String 202 is a set of (the 
head bit is equal to 0) + raw data (Raw Byte) 204 or a set of (the head 
bit is equal to 1) + a command (Command/Pointer) 205. "The head bit" 
denotes the specific bit (extra ID-bit) mentioned above. Although the 
Raw Byte 204 is constructed by 8 bits (1 byte) like ASCII, in the 
command (Command/Pointer) 205, a pointer (Pointer) 207 is also 
considered to be a kind of command. The Pointer 207 whose head bit is 
equal to 0 and a command (Command) 206 whose head bit is equal to 1 
are distinguished from each other. Such identification is introduced because the encoding 
which preferentially uses the command has been used in the 
embodiment. However, the code length becomes long by one bit due to 
such an identifying process. 

The Pointer 207 is constructed by an offset (Offset) 210 and 
a code length (Length) 211. As an Offset 210, a 6-bit offset (7 bits in 
total) whose head bit is equal to 1 or an 11-bit offset (12 bits in total) 
whose head bit is equal to 0 is prepared in consideration of the encoding 
efficiency. Although the 7-bit offset whose head bit is equal to 1 has 
been used in the encoding method in the embodiment 1, since the code 
length becomes long by one bit in order to identify the command and the 
pointer as mentioned above, the offset is shortened for the purpose of 
compensating it. The Length 211 is encoded by the entropy methods of 
assigning the code of the short code length to the pointer of the short 
data length whose frequency of appearance is high in a manner similar 
to the embodiment 1. 
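
The two offset widths can be illustrated with a small parsing sketch. The function name is hypothetical, and bit strings are represented as Python strings of "0" and "1" purely for readability; the Length code is left undecoded because its code table is not given in this part of the description.

```python
def parse_offset(bits):
    """Read the Offset field that follows the two identifying bits of a pointer.

    A head bit of 1 selects the 6-bit offset (7 bits in total); a head bit
    of 0 selects the 11-bit offset (12 bits in total). Returns the offset
    value and the remaining bits, which hold the Length code.
    """
    flag, rest = bits[0], bits[1:]
    width = 6 if flag == "1" else 11
    if len(rest) < width:
        raise ValueError("offset field is truncated")
    return int(rest[:width], 2), rest[width:]


# "10100001101" is quoted later in the text as the pointer (3, 3) in the
# embodiment 2; the leading "10" identifies it as a pointer.
code = "10100001101"
offset, length_code = parse_offset(code[2:])
print(offset, length_code)
```

For this code the short offset is selected and the value 3 is obtained, leaving the 2-bit Length code.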

The Command 206 in the embodiment 2 is constructed by a 
kind of command (Command Set) 208 and an operand (Operand) 209. 
In the embodiment, since 4 bits are assigned to the kind of command, 16 
commands can be designated. The Operand 209 is used to designate a 
parameter of the command. In the embodiment, the End Marker 203 
is one of the commands and is specifically defined by 6 bits of a code 
"110000". 

The embodiment 2 has a feature such that the short code is 
assigned to the command as compared with the embodiment 1. That is, 
if head two bits of the Compressed String 202 are equal to "11", it is 
determined to be the command. Although the number of bits of the 
Raw Byte 204 is equal to 9 bits similar to that in the embodiment 1, the 
Pointer 207 becomes long by 1 bit and its encoding efficiency 
consequently deteriorates. However, the embodiment 2 can contribute 
to the data compression owing to the short command in which the total 
number of bits is reduced. 
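
The identification by the head bits in the embodiment 2 can be sketched as follows. The function name is hypothetical, and each code is assumed to be available as a separate string of "0" and "1" characters, which a real decoder reading a continuous bit stream would not have.

```python
END_MARKER = "110000"  # 6-bit end marker of the embodiment 2


def classify(code):
    """Classify one code of the Compressed String by its head bits."""
    if code.startswith("0"):
        return "raw byte"    # head bit 0, followed by 8 bits
    if code == END_MARKER:
        return "end marker"  # one of the commands
    if code.startswith("11"):
        return "command"     # head two bits "11"
    if code.startswith("10"):
        return "pointer"     # head bit 1, then 0
    raise ValueError("empty or malformed code")


for sample in ["0" + format(ord("A"), "08b"), "10100001101",
               "11001000000010100", END_MARKER]:
    print(sample, "->", classify(sample))
```

The pointer and command samples are the codes quoted in the description of the operation of the embodiment 2 for "(3, 3)" and "RP(1, 4)".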



Fig. 11 is an explanatory diagram showing a construction of 
a command in the embodiment 2. 

The embodiment 2 differs from the construction of the 
embodiment 1 shown in Fig. 4 with respect to a point that the code of 
the command is shortened to a short command code "11" 31 from the 
end marker 21 in the embodiment 1. That is, in the embodiment 1, the 
code 0x180 has been assigned to the End Marker 103 in Fig. 3 and the 
operand such as a kind of command or the like has been added to the 
code. In the embodiment 2, the short command code "11" is assigned to 
the heads of all commands and, further, 4 bits and the number of bits 
corresponding to the parameter are assigned in accordance with the 
kind of command. This means that a status in preference to the 
pointer has been given to the "command" introduced in the invention. 
By using the command in the embodiment, the compression ratio which 
is fairly higher than that of the conventional compression method such 
as LZSS or the like having only the pointer can be accomplished. In 
the embodiment, an extension code 32 (a kind 32a of command + an 
operand 32b) is set as shown in Fig. 11(b) in a manner similar to (the 
kind 22a of command + the operand 22b) in Fig. 4. 

The kind of command can be formed in a manner similar to 
the definition table of Fig. 5 described in the embodiment 1. 

Fig. 12 is an explanatory diagram of an example of creation 
of the command in the embodiment 2. 

As shown in the diagram, the embodiment 2 differs from the 
embodiment 1 with respect to a point that the code "11" is assigned in 
place of the end marker 0x180 in the embodiment 1 and other points are 
similar to those in the embodiment 1. 
<Operation> 

Since the data compressing operation in the data 
compressing apparatus in the embodiment 2 is similar to that in the 
embodiment 1 shown in Fig. 6, its explanation is omitted here. The 
operation in this example will be described also with reference to Fig. 8. 

First, if the head character string "ABC" in Fig. 8(c) is 
encoded as shown in Fig. 10, it is performed in a manner similar to the 
conventional encoding. Therefore, 9 bits per character, that is, 27 bits 
in total are necessary. Although the next pointer "(3, 3)" is the code 
"11000001101" in the embodiment 1, it is the code "10100001101" in the 
embodiment 2 and can be similarly constructed by 11 bits. This is 
because the shorter offset is set to 6 bits in the embodiment 2. The 
next output data "RP(1, 4)" is a command according to the embodiment 
2 and consists of 24 bits in the embodiment 1. However, according to 
the encoding in Fig. 10, it is "11001000000010100" and can be 
constructed by 17 bits. Therefore, the total number of bits is equal to 
55 bits. This value is about 38% of the 144 bits before the 
compression, which is a considerable improvement over the 44% of 
the LZSS. 

Subsequently, when output data "RD(3, 3, 5)" in Fig. 8(d) is 
similarly encoded, it can be expressed by 25 bits in the case of the 
embodiment 2. The total number of bits is equal to 52 bits. This 
value is about 36% of the 144 bits before the 
compression, so that the compression ratio is further improved over 
the 44% of the LZSS. 

Since the data decoding process differs from that in the 
embodiment 1 only with respect to a point that the discrimination of the 
command is made on the basis of whether head 2 bits are equal to "11" 
or not and other operations are similar to those in the embodiment 1, 
its explanation is omitted here. 
<Effects> 

As mentioned above, according to the embodiment 2, since 
the first bit is used to distinguish the character data from the pointer 
and the command, and the second bit is used to distinguish the pointer 
and the command, the code regarding the command can be made 
shorter than that in the embodiment 1 and the compression ratio can be 
further improved. 
«Embodiment 3» 

The embodiment 3 relates to a new encoding method called 
a Dynamic Code Assignment method. Such a method differs from the 
conventional entropy coding methods with respect to a point that the 
codes assigned to the character string, command, and the like change 
dynamically. The entropy coding methods presume that the shorter 
the character string is, the more frequently it appears, and 
therefore assign the short code to such a short character string. 
This is an encoding method based on what is called an empirical rule. 

On the other hand, the Dynamic Code Assignment method 
in the embodiment 3 is a method whereby a character string which 
actually and frequently appears in the data is encoded every definition 
unit of a predetermined data amount such as an input buffer. Since 
the different encoding is executed every definition unit, a phenomenon 
such that the same code in the output data has different meanings 
occurs. According to this method, since the short code is assigned to 
the character string which actually and frequently appears, the 
compression ratio can be further raised. 

<Construction> 

Since a construction on the diagram is similar to that in Fig. 
1 in the embodiment 1, it is not shown here. A data compressing 
apparatus in the embodiment 3 differs from those of the embodiments 1 
and 2 with respect to a point that the unit which is used when the 
substituting means 2 executes the encoding process is the definition 
unit of the predetermined data amount such as an input buffer unit. 
In the embodiment 3, the entropy coding methods are applied to the 
character string and commands. 

Fig. 13 is an explanatory diagram showing an example of 
creation of the command in the embodiment 3. 

A foundation of the embodiment 3 is data definition (Data 
Definition) commands and a code substitution (Code Substitution) 
command. As shown in Fig. 13, a character string definition command 
SD (String Definition), a character string definition output command 
SDO (String Definition and Out), and a pointer definition command PD 
(Pointer Definition) are provided as data definition commands. The 
SD(B, L, M) command defines the L characters from a position traced 
back by B characters as the "Mth" character string. As will be 
explained hereinlater, the "Mth" is 
generally set to the order of the frequency of appearance of the 
character string in the input buffer. The SDO(B, L, M) command 
defines the L characters from a position traced back by B characters 
as the "Mth" character string and 
outputs them. The meaning of the existence of this command is that 
since the definition of the character string and the function of the 
pointer are simultaneously accomplished by one code, the code amount 
can be saved. The PD(M) command defines the pointer just before as 
the "Mth" pointer. Similarly, "the Mth" is generally set to the order of 
the frequency of appearance of the character string. A CS(M) 
command is shown as a code substitution command. The CS(M) 
command is a command to substitute a short code for the Mth character 
string or the like designated by the data definition command. 

It is a feature of the embodiment that if the code length of 
the code substitution command CS(M) itself is set to be very short and 
the character string, the command, or the like is replaced with the code 
in accordance with the order of the frequency of appearance in the 
output buffer, the high compression ratio can be accomplished as a 
whole. By making such definition every input buffer, the definition to 
which the local data structure of the input data is reflected is made. 
Therefore, the more proper compression can be made as compared with 
the conventional compression method of performing the fixed encoding 
to the whole input data. 

Fig. 14 is an explanatory diagram showing a relation 
between the frequency of appearance of the same character string and 
the encoding in a certain input buffer. 

In the case of applying the Dynamic Code Assignment 
method in the embodiment 3, the character string which is designated 
by the data definition command is limited by the number of bits of a 
parameter M constructing the command. In the embodiment, M is 
equal to 5 bits and 32 character strings can be designated. In the 
embodiments 1 and 2, the input buffer is equal to 2 kbytes (value which 
is set because the offset is equal to maximum 11 bits). It is also 
assumed in the embodiment 3 that the input buffer as a definition unit 
is equal to 2 kbytes. In such an input buffer, in order to optimize the 
number of bits of the parameter M, it is necessary to experimentally 
obtain the value. It is set to the value of 5 bits in the embodiment 3. 

As shown in Fig. 14(a), even if a frequency of appearance 
(T) is high, the maximum number of character strings which can be 
defined is set to 32 due to the limitation of the number of bits of the 
parameter M. Therefore, the definition or the like of a character 
string whose order M of the frequency of appearance falls below the 
32nd place is not performed, and the definition is stopped on the basis 
of the number of bits of the parameter M as limitation. Fig. 14(b) 
shows an example in which the definition is stopped on the basis of the 
frequency of appearance (T). It is similar to the reason in which the 
length of matching (Match) is ordinarily set to 2 bytes in the LZSS. 
That is, this is because even if the character string whose frequency of 
appearance (T) is low is defined and the substitution is performed, a 
degree of contribution to the improvement of the compression ratio is 
small. Therefore, in the example shown in the diagrams, when the 
frequency of appearance (T) is equal to or less than 2, the definition or 
the like is not performed. Although a plurality of pointers which 
designate the same character string in a certain input buffer have 
different offset values, in the example shown in Fig. 14, the frequency 
of appearance is counted as a same pointer which designates the same 
character string. 
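
The two stopping conditions of Fig. 14 can be sketched as follows. The function name and its parameters are hypothetical; the list passed in is assumed to contain one entry per appearance of a character string, with pointers to the same character string already counted as the same entry.

```python
from collections import Counter


def rank_definitions(strings, max_definitions=32, min_frequency=3):
    """Return (character string, frequency) pairs in designated order.

    max_definitions reflects the 5-bit parameter M (32 character strings),
    and min_frequency reflects the rule that no definition is made when
    the frequency of appearance T is equal to or less than 2.
    """
    counts = Counter(strings)
    ranked = [(s, t) for s, t in counts.most_common() if t >= min_frequency]
    return ranked[:max_definitions]


appearances = ["ab"] * 5 + ["cd"] * 3 + ["ef"] * 2
print(rank_definitions(appearances))
```

In this example "ef" appears only twice and is therefore not defined, corresponding to the case of Fig. 14(b).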

Discrimination about whether the pointer is the same 
pointer or not is determined from a relation between an address of the 
character string as a target at present in the input buffer and the offset 
value. 

Fig. 15 is an explanatory diagram showing the 
discriminating means for discriminating whether the pointer is the 
same pointer or not. 

The example shown in the diagram is a case of 
discriminating whether the pointer P2 is the same as the pointer P1 or 
not. The first discriminating condition is that the character strings 
shown by the two pointers P1 and P2 show the same number of 
characters and this value coincides with the number of characters of a 
character string S1. The second discriminating condition is that a 
difference between relative addresses Padr and Cadr which the two 
pointers P1 and P2 have in the input buffer coincides with a difference 
between offset values OffsetadrP and OffsetadrC which the two 
pointers P1 and P2 have. In order to show that the pointer as an 
examination target at present is P2, a character "C (Current)" is used. 
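
The two discriminating conditions can be sketched as follows. The class and function names are hypothetical, and the sign convention (current minus previous on both sides) is an assumption; the text only states that the two differences coincide.

```python
from typing import NamedTuple


class Pointer(NamedTuple):
    address: int  # relative address in the input buffer (Padr or Cadr)
    offset: int   # offset value (OffsetadrP or OffsetadrC)
    length: int   # number of characters indicated by the pointer


def is_same_pointer(previous, current):
    """Apply the two discriminating conditions of Fig. 15."""
    # First condition: both pointers indicate the same number of characters.
    same_length = previous.length == current.length
    # Second condition: the difference between the addresses coincides with
    # the difference between the offsets, so both indicate the same position.
    same_target = (current.address - previous.address
                   == current.offset - previous.offset)
    return same_length and same_target


p1 = Pointer(address=3, offset=3, length=3)
p2 = Pointer(address=9, offset=9, length=3)
print(is_same_pointer(p1, p2))
```

Here both pointers indicate the three characters at the head of the input buffer, so they are counted as the same pointer even though their offset values differ.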

Fig. 16 is an explanatory diagram showing an example of 
creation of a definition table of character strings and the like. 

The substituting means 2 forms the definition table every 
input buffer. The definition command which defines the character 
string existing in the designated order M is listed together with its 
parameter. The definition command of each designated order shows a 
different character string or pointer. For example, the definition 
command SDO(B, L, M) of the designated order 1 and the definition 
command SDO(B, L, M) of the designated order 4 show the different 
character strings. A valid/invalid flag is a flag showing whether the 
corresponding definition is valid or invalid. In the embodiment, the 
definition commands of the designated orders down to 16 are the valid 
definition (value is equal to 1) in accordance with Fig. 11(b) in the 
embodiment 2. The definition table can be provided in a memory 
which can be used by an application program. In this case, if the 
processed codes are stored so that the designated order M does not 
overlap, the memory amount can be reduced. 

Fig. 17 is an explanatory diagram showing an example of 
the encoding in the embodiment 3. 

The encoding method in this embodiment can be compared 
with that in the embodiment 2 in Fig. 10. 

Output data (Compressed Stream) 301 is constructed by a 
compressed data string (Compressed String) 302 and an end marker 
(End Marker) 303. The Compressed String 302 is a set of (the head bit 
is equal to 0) + raw data (Raw Byte) 304 or a set of (the head bit is equal 
to 1) + a command (Command/Pointer) 305. Although the Raw Byte 
304 is constructed by 8 bits (1 byte) like ASCII, in the 
Command/Pointer 305, a pointer (Pointer) 307 is also considered to be a 
kind of command and the pointer (Pointer) 307 whose head bit is equal 
to 0, and a command (Command) 306 whose head bit is equal to 1 are 
identified. 

The pointer (Pointer) 307 is constructed by an offset 
(Offset) 310 and a code length (Length) 311. As an Offset 310, a 6-bit 
offset (7 bits in total) whose head bit is equal to 1 or an 11-bit offset (12 
bits in total) whose head bit is equal to 0 is prepared in consideration of 
the encoding efficiency. The Length 311 is encoded by the entropy methods of 
assigning a code of a short code length to the pointer of a short data 
length in which a frequency of appearance is high in a manner similar 
to the embodiments 1 and 2. 

The embodiment is mainly characterized by the encoding of 
a command. The Command 306 in the embodiment is constructed by a 
kind of command (Command Set) 308 and an operand (Operand) 309. 
When the kind of command is encoded, the entropy methods are used. 
This is because it is considered that use frequencies of commands are 
varied remarkably. For example, a code "00" is assigned to the CS 
command (314 in the Command Set 312 in the diagram). This is 
because it is considered that such a code appears most frequently for 
the substitution for the pointer or the like. Subsequently, a code "01" 
is assigned to the SDO command (315 in the diagram). Since the 
definition command is ordinarily used up to the maximum value of the 
designated order M, its frequency of appearance is high. Subsequently, 
a code "10" is assigned to the PD command (316 in the diagram). A 
possibility that this command is used as a substitution for the pointer 
is high. If all commands frequently appear, the method for simply 
assigning 4 bits in the embodiment 2 shown in Fig. 10 can be also used. 

In the embodiment 3, 22 kinds of commands are defined and 
a code "11111111" is assigned to the END command. The entropy 
methods can be also used with respect to the Operand 309 (313 in the 
diagram). It is desirable that the higher the parameter M assigned to 
the designated order is, the shorter the code which is set (shown in 318 
in the diagram). The designated orders down to 22 are assured here. 
The entropy methods can be used with respect to the other parameters. 
The method used in this case is slightly different from the entropy 
methods applied when the number of characters of the pointer is 
encoded. This is because the shortest length is equal to 2 in the LZSS 
since the pointer length is set to 2 bytes, and there is no need to add such a 
limitation to the command code or parameter in the embodiment. 
<Operation> 

Fig. 18 is a flowchart showing the operation of the 
substituting means 2 in the embodiment 3. 

First, when a processing routine is started and the data in 
the input buffer is updated (step S31), the definition table is initialized 
(step S32). The initialization of the definition table is made by 
invalidating the valid/invalid flag shown in Fig. 16 (a value 0 is 
written). Subsequently, the raw data in the input buffer is compressed 
by the compressing apparatus 1 (step S33). Character data or pointer 
is outputted as an intermediate compression output (step S34). That 
is, the compressing process in steps S33 and S34 corresponds to the 
compressing process according to the conventional LZSS. 

Subsequently, the substituting means 2 writes data into the 
definition table (step S35). That is, the frequency of appearance of the 
character string in the target input buffer is obtained and the data is 
written into the definition table in a manner such that the higher the 
frequency of appearance of the character string is, the more the 
designated order (M) is raised. That is, in Fig. 16, the code of the 
definition command corresponding to the character string of each 
designated order (M) is written and the valid/invalid flag is validated (a 
value 1 is written). The substituting means 2 executes various 
compressing processes (step S36). The various compressing processes 
are, for example, a substituting process of the pointers as shown in Fig. 
7, the embedding process of the copyright information or the like, and 
further, the setting of the code substitution command. 

Fig. 19 is an explanatory diagram of the setting of the code 
substitution command. 

Fig. 19(a) shows a case of handling only one pointer. There 
are "... P1 ... P1 ... P1 P1 ..." as character data or the like as 
intermediate data 12 and "... SDO ... CS ... CS CS ..." is 
outputted as output data 13 by using the substituting means 2. In the 
embodiment, the first pointer "P1" is replaced with the character string 
definition output command SDO (refer to e in the diagram). If the 
pointer definition command PD is used, one pointer is duplicated. 
However, in this case, since the command to operate such a pointer can 
be used, it is necessary to discriminate on the basis of the final 
compression ratio as a reference. The second and subsequent pointers 
are replaced with the code substitution commands CS (refer to g to V in 
the diagram). Although the pointers P1 have different offsets, they 
are handled as a pointer indicative of the same character string. 

Fig. 19(b) shows the case of handling a plurality of pointers. 
The first pointers "P1" and "P2" in the intermediate data 12 are 
replaced with the character string definition output commands "SDO1" 
and "SDO2" in the output data 13, respectively. The subsequent 
pointers "P1" and "P2" are replaced with code substitution commands 
"CS1" and "CS2", respectively. In the case of handling a plurality of 
pointers, each pointer can also be independently defined and replaced. 
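
The substitution of Fig. 19(b) can be sketched as follows. The token representation is an assumption made only for illustration: a pointer is written as the tuple `("P", key)`, where `key` identifies the character string it designates, and the function name `substitute` is hypothetical. For simplicity the sketch numbers the definitions in order of first appearance instead of by frequency of appearance, and it omits the operands that an actual definition command would carry.

```python
def substitute(intermediate):
    """Replace pointers in the intermediate data with SDO / CS commands.

    The first pointer to each character string becomes a definition output
    command ("SDO", n); every later pointer to the same string becomes a
    code substitution command ("CS", n). Other tokens pass through unchanged.
    """
    numbers = {}   # key of the designated string -> definition number
    output = []
    for token in intermediate:
        if isinstance(token, tuple) and token[0] == "P":
            key = token[1]
            if key not in numbers:
                numbers[key] = len(numbers) + 1
                output.append(("SDO", numbers[key]))
            else:
                output.append(("CS", numbers[key]))
        else:
            output.append(token)   # raw character data
    return output


intermediate = ["a", ("P", "x"), "b", ("P", "y"), ("P", "x"), ("P", "y")]
output = substitute(intermediate)
```

Here the pointers to "x" and "y" play the roles of P1 and P2, so the output contains SDO 1 and SDO 2 followed by CS 1 and CS 2.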

Returning to Fig. 18, whether all the data has been 
processed or not is discriminated (step S37). If the data in the input 
buffer to be processed exists, the processes from step S31 mentioned 
above are repeated. If all the data has been processed, the data 
compressing process is finished. 

How the decoder handles the output data compressed by the 
data compressing apparatus in the embodiment will now be described. 

Since the operation of the decoder is fundamentally similar 
to that of Fig. 9, it will be explained with reference to Fig. 9. The 
embodiment 3 differs from it with respect to the handling of the data 
output command. For example, in the embodiments 1 and 2, the commands 
other than the data output command are handled so as to execute the 
display of the copyright information or the like (step S2.1). In the 
embodiment 3, in the case of the data 
definition command (for example, SDO command), data is written into 
the definition table as shown in Fig. 16. That is, the definition table is 
formed again every predetermined data unit such as an input buffer. 
The code substitution command (for example, CS command) is handled 
as a data output command ("Y" in step S18 in Fig. 9). That is, the data 
definition command corresponding to the designated order (M) as a 
parameter of the code substitution command is referred to in the 
definition table and the defined character string is outputted. In this 
case, the valid/invalid flag of the designated order M is referred to and 
if it is invalid, a decoder error occurs. 
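
The handling of the definition command and the code substitution command on the decoder side can be sketched as follows. The command tuples, the table size, and the names `DecoderError` and `decode_unit` are assumptions for illustration only. In particular, the sketch assumes that the SDO command carries the character string directly and also outputs it, whereas an actual command would refer to the string through its operands.

```python
class DecoderError(Exception):
    """Raised when a code substitution command refers to an invalid entry."""


def decode_unit(commands, table_size=8):
    """Decode the commands of one predetermined data unit.

    The definition table is formed again for every data unit, so it starts
    empty here. An entry of None corresponds to an invalid flag.
    """
    table = [None] * table_size
    output = []
    for command in commands:
        kind = command[0]
        if kind == "LIT":                 # raw character data
            output.append(command[1])
        elif kind == "SDO":               # define the string, then output it
            _, order, string = command
            table[order] = string
            output.append(string)
        elif kind == "CS":                # look up the designated order (M)
            order = command[1]
            if table[order] is None:
                raise DecoderError(f"order {order} is not defined")
            output.append(table[order])
        else:
            raise DecoderError(f"unknown command {kind!r}")
    return "".join(output)


text = decode_unit([("LIT", "x"), ("SDO", 0, "abc"), ("LIT", "y"), ("CS", 0)])
```

A CS command whose designated order has not been defined in the current data unit raises `DecoderError`, which corresponds to the decoder error described above.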

In the embodiment 3, the designated order M is determined 
on the basis of the frequencies of appearance, as a reference, of a 
plurality of pointers which designate the same character string and 
whose offsets are different. However, other references can be used. 
For example, there is a method whereby a product of the number of 
character strings and the number of pointers is considered as a total 
amount of data and it is used as a reference. However, since shorter 
data ordinarily appears more frequently, using such a reference may 
contribute little to the improvement of the compression ratio. There is 
also a method whereby a product of the number of codes of the 
character strings and the number of pointers is considered as a total 
amount of codes and it is used as a reference. However, it is 
considered that when the number of pointers is small, the compressing 
effect is also small. 
<Effects> 

As mentioned above, according to the embodiment 3, in the 
case where the character string designated by the specific pointer 
coincides with the character strings designated by a plurality of other 
pointers, the process for replacing the specific pointer with the 
definition command and replacing the plurality of other pointers with 
the code substitution commands corresponding to the definition 
command is executed every predetermined data unit such as an input 
buffer. Therefore, the short code can be dynamically assigned to the 
character string which actually and frequently appears. Thus, the 
local structure of the data structure can be accurately grasped and the 
compression ratio can be further improved. 

According to the embodiment 3, in the case where each of a 
plurality of commands is constructed by the kind of command and the 
operand, encoding is performed such that the code length gradually 
increases in accordance with the kind of each command or the parameter 
of each operand. Therefore, the short code can be assigned to the command 
which appears frequently, so that the compression ratio can be further 
improved. 
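
This passage does not give the concrete code that is used. Purely as an illustration of a code whose length gradually increases with the parameter, the following sketch uses the well-known Elias gamma code as a stand-in. It is not the code of the embodiment, and the parameter is assumed to be a positive integer such as a rank counted from 1.

```python
def elias_gamma(n: int) -> str:
    """Return the Elias gamma code of a positive integer as a bit string.

    The code consists of (k - 1) zero bits followed by the k-bit binary
    representation of n, so that small values receive short codes.
    """
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


codes = [elias_gamma(n) for n in range(1, 6)]
# codes == ["1", "010", "011", "00100", "00101"]
```

With such a code, a parameter that is used frequently can be given a small value and therefore a short code.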

According to the embodiment 3, among the character 
strings existing in the predetermined data unit such as an input buffer 
or the like, the character string which appeared first is selected as a 
character string designated by the specific pointer. Therefore, the 
substitution to the definition command or the substitution to the code 
substitution command can be easily performed. 

According to the embodiment 3, if a plurality of definition 
commands are set, the frequencies of appearance of a plurality of 
character strings instructed by the pointers replaced with those 
definition commands in the predetermined data unit such as an input 
buffer are counted and the definition table in which the definition 
commands are written is formed in descending order of the frequencies of 
appearance. Therefore, the short code can be dynamically assigned to 
the character string which actually and frequently appears. Thus, the 
compression ratio can be further improved. 

According to the embodiment 3, whether the number of 
characters of the character string designated by the specific pointer 
coincides with that of the character string designated by another 
pointer or not is discriminated. Whether the difference between the 
address of the character string designated by the specific pointer in the 
predetermined data unit such as an input buffer and that of the 
character string designated by the other pointer coincides with the 
difference between the offset value which the specific pointer has and 
that which the other pointer has or not is discriminated. If they 
coincide, it is determined that the character string designated by the 
specific pointer and the character string designated by the other 
pointer coincide. Therefore, whether the character strings designated 
by the different pointers coincide or not can be easily discriminated. 
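
The coincidence test described above can be sketched as follows. The sketch rests on two assumptions that the text does not state explicitly: the "address" is taken to be the position in the data unit at which the pointer's character string is reproduced, and the offset is taken to be the backward distance from that position to the referenced string. The names `Pointer` and `same_string` are hypothetical.

```python
from typing import NamedTuple


class Pointer(NamedTuple):
    position: int   # address in the data unit (assumed meaning, see above)
    offset: int     # backward distance to the referenced string (assumed)
    length: int     # number of characters of the designated string


def same_string(a: Pointer, b: Pointer) -> bool:
    """Decide whether two pointers designate the same character string.

    The strings coincide when the numbers of characters are equal and the
    difference of the addresses equals the difference of the offsets, which
    means that position - offset is the same for both pointers.
    """
    if a.length != b.length:
        return False
    return (a.position - b.position) == (a.offset - b.offset)


# In "abcXabcYabc" both pointers refer back to the "abc" at address 0.
first = Pointer(position=4, offset=4, length=3)
second = Pointer(position=8, offset=8, length=3)
match = same_string(first, second)   # True
```

Under these assumptions the test needs only integer comparisons and does not have to compare the character strings themselves.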



Although the character data and the pointer according to 
the conventional LZSS or the like have been used as intermediate data 
12 which is inputted to the substituting means 2 in each of the above 
embodiments, the compression method of the compressing apparatus 1 
is not limited to the LZSS. Arbitrary data can be used as intermediate 
data 12 inputted to the substituting means 2 so long as it comprises 
raw data such as character data or the like and pointers. 

According to the invention as mentioned above, when the 
data string as a target of the command coincides with the data string 
given by the raw data, the data string indicated by the pointer, or 
a combination of them, the command is outputted in place of the raw 
data or the pointer. Therefore, for example, various information such 
as copyright information and a password can also be easily embedded and 
compression can be performed flexibly. It is also possible to contribute to 
the improvement of the compression ratio. 

The present invention is not limited to the foregoing 
embodiments but many modifications and variations are possible 
within the spirit and scope of the appended claims of the invention. 